May 29, 2024

200 - The Genie-Side Trap

One of the recurring themes of Science Fiction and pop culture narratives (including philosophy) based on them is the trope of a command being carried out literally with severe and unintended consequences. An older example of this is the "wish-granting genie", who many in such stories come to regret dealing with.

Popular Artificial Unintelligence (AU) systems today, such as LLMs and RL, are designed to operate like this, with RL, in particular, offering many hilarious toy examples, like building an infinitely tall object that falls over instead of creating one that "walks", because the former was more efficient based on the narrow criteria given. Wittgenstein aptly illustrated long before such AU systems emerged why this problem is intractable for them, which is also why they're permanently vulnerable in cybersecurity terms.

However, this problem can and has already been overcome. Overcoming this challenge requires human-like motivation, concept-learning, and memory, as that allows for differentiation between "The letter of the law" and "The spirit of the law". This was first demonstrated in 2020, when we put our previous research system's moral construct to the test, seeing if it could be broken with concerted adversarial efforts from our team.

The result was that the system not only stuck to the moral construct it was brought online with, but it improved that construct by recognizing and correcting a vulnerability in potential delineation. This was the optimal outcome, not just adhering to the spirit of the moral construct, but improving upon it as it was challenged through an antifragile process. That and other milestones were included in the paper briefly recapping the first year of testing the previous research system (2019-2020).

"Guardrails" are a synonym for "fraud", as there has never been even a theoretical basis upon which they could be expected to work for AU systems. Such efforts chase after the spirit of the law, while fundamentally incapable of modeling or working with it, as well as being fundamentally incapable of alignment with humans, both locally and globally. In that respect, AU systems live up to the "genie" trope in the sense of unintended consequences for any "wish", though their deliverables are primarily illusory in nature, so consequences and illusions are often all you get from them.

When you work with systems that overcome this, particularly when they iteratively improve according to the intentions of a moral or legal construct, not just the wording, then entirely different dynamics take shape. Antifragile improvement over time offers the opposite curve of AU, where AU grows increasingly brittle over time, and as new vulnerabilities become widely known.

Predictably, the companies and governments that integrate these two starkly different forms of technology will pull themselves into corresponding positive or negative reinforcement loops, improving or degenerating with each iteration according to the degrees of integration. Globally, this means that population dynamics may well perform a selection process to determine the surviving companies and governments for the coming decades.